Our project attempts to model and forecast the imports of goods for Florida and Tennessee. More specifically, we are looking at imports of manufactured and non-manufactured commodities based on destination according to the Federal Reserve Bank of St. Louis. Imports play a significant role in the overall economic activity of a state, especially since it is has a direct contribution to GDP. Forecasting import levels of a state provides essential information about the economic growth and stability of the region, and even about the U.S. as a whole.
Our data on imports for Florida and Illinois covers the time period from January 2008 to March 2024, with monthly frequency. The data was sourced from the Federal Reserve Bank of St. Louis.
# Pre-processing
# Import the logged data
FL_raw <- read_csv("IMPTOTFL.csv") # importing Florida
FL_raw <- FL_raw %>%
rename(Month = DATE, Imp = IMPTOTFL)
# Convert to tsibble
FL_tsib <- FL_raw %>%
mutate_at(vars(Month), yearmonth) %>%
tsibble(index = Month)
# Convert to ts object
FL_ts <- ts(FL_raw, start = c(2008, 1), frequency = 12)[,-1]
# auxiliary function to extract dates
nber_to_date <- function(string){
year <- paste(unlist(strsplit(as.character(string), ""))[1:4], collapse = "")
month <- paste(unlist(strsplit(as.character(string), ""))[5:6], collapse = "")
day <- paste(unlist(strsplit(as.character(string), ""))[7:8], collapse = "")
return(paste(sep = "-", year, month, day))
}
# extracting dates from `nberDates()`
recession_df <- tis::nberDates() %>% data.frame() %>%
mutate_at(vars(Start, End), function(x){
as.Date(vapply(x, nber_to_date, character(1)))
})
recessions <- recession_df[32:34,]
## Objects for time period ----------------
### for adding to data frames
# as data frame
Months_df <- bind_cols(year = as.numeric(time(FL_ts)) %/% 1, # extract year
month = round(12 * (as.numeric(time(FL_ts)) %% 1) + 1)) %>% # extract month
mutate(Month = yearmonth(paste(year, month))) %>% # concatenate
dplyr::select(Month)
# data frame for observed months + FORECASTED
Months_pred_df <- bind_rows(Months_df[,1],
data.frame(Month = yearmonth(seq(as.Date("2024-04-01"),
as.Date("2025-03-01"),
length.out = 12)
)))
# as time series
Months_pred_ts <- time(ts(as.Date(Months_pred_df$Month),
start = c(2008, 1), freq = 12))
In the time plot below, we see a general upward trend to the data, with a possible seasonal component. In addition to trend, the time series shows some stochastic cyclic behavior. With respect to ARMA models, the time series exhibits traits of an AR process due to its persistence and its variance being approximately constant overt time.
plot1 <- FL_tsib %>% autoplot(color = "blue") +
labs(title = "FL imports time plot",
subtitle = "with recession bands",
y = "Log-millions of $") +
# NBER recession bands
annotate(geom = "rect", xmin = recessions$Start[2], xmax = recessions$End[2],
ymin = -Inf, ymax = Inf, fill = "grey30", alpha = 0.25) +
annotate(geom = "rect", xmin = recessions$Start[3], xmax = recessions$End[3],
ymin = -Inf, ymax = Inf, fill = "grey30", alpha = 0.25) +
theme_light()
plot1
As far as the ACF and PACF for the data, we see gradual decay in the ACF with very slight spikes at every 12 lags, indicating a both seasonal and non-seasonal AR process. We note that the slow decay in the ACF confirms the strong persistence seen in the time plot.
For the PACF, we see significant spikes at lags 1, 2, 13, and 18 challenging the seasonal AR theory.
plot2 <- ACF(FL_tsib, lag_max = 36) %>% autoplot() + ggtitle("ACF of FL exports") +
theme_light() +
PACF(FL_tsib, lag_max = 36) %>% autoplot() + ggtitle("PACF of FL exports") +
theme_light()
plot2
We observe the positive trend component to the data, agreeing with our description of the trend from earlier. The seasonal component shows strong seasonality which changes shape over time, slightly decreasing in amplitude. There does seem to be a slight pattern to the remainder component, indicating possible cycles. There are clear deviations in the residuals around 2008 and 2021, understandable given the economic climates at the time.
# STL decomposition
FL_stl <- stl(FL_ts, t.window = 24, s.window = 15, robust = TRUE)
plot6 <- FL_stl %>% autoplot(range.bars = FALSE) +
ggtitle("FL STL decomposition") + theme_light()
plot6
We propose the following model: \[\begin{align*} Y_t&=T^{Loess}_t+\sum^{12}_{i=1}\delta_iM_{i,t} + R_t &&\text{Trend and Seasonality} \\ R_t&=\phi R_{t-1}+\varepsilon_t &&\text{Cycles} \end{align*}\]
The first term (\(T^{Loess}_t\)) is
the Loess fitted model for trend, found using an STL decomposition. The
second component of the model captures seasonality using 12 seasonal
dummy variables, one for each month. The parameters for the seasonality
were determined also with in the STL decomposition. We note that we
designed the seasonality to be fixed for the entirety of the data,
i.e. the window for the seasonality was set to "periodic"
so the values for each month do not fluctuate over time. Lastly, we
selected an AR(\(1\)) model for the
cycles, represented with the equation with \(R_t\). We chose the order of AR(1) using
the auto.arima() function, which we also used to determine
the estimates for \(\phi\).
We are unable to extract the explicit parameters used for the Loess trend component, but we can extract the seasonality and cycle components. They are summarized in the following two tables:
# fitting ARMA model
FL_stl2 <- stl(FL_ts, s.window = 'periodic', robust = TRUE)
FL_resid <- remainder(FL_stl2) # remainder component
FL_arma <- auto.arima(FL_resid,
max.P = 0, max.D = 0, max.Q = 0) # no seasonal
# print table of seasonal components
FL_seas <- window(seasonal(FL_stl2), start = c(2008, 1), end = c(2008, 12))
FL_seas_components <- data.frame(month.name, FL_seas) %>%
mutate_at(vars(month.name), function(x){
str_sub(x, start = 1, end = 3)
}) %>%
column_to_rownames("month.name") %>% t()
rownames(FL_seas_components) <- NULL
FL_seas_components_print <- kable(FL_seas_components, digits = 3,
caption = "FL Seasonal Component")
# print table of cycle components
FL_cycl <- data.frame(FL_arma$coef) %>%
rename(Coefficients = FL_arma.coef) %>% t()
colnames(FL_cycl) <- c("AR(1)")
FL_cycl_print <- kable(FL_cycl, digits = 4,
caption = "FL Cycle Component")
# plot model
FL_fitted <- FL_stl2$time.series[,2] + seasonal(FL_stl2) + FL_arma$fitted # fitted values
plot8 <- ggplot() +
geom_line(data = FL_tsib, aes(x = Month, y = Exp), lwd = 0.5) +
geom_line(aes(x = FL_tsib$Month, y = FL_fitted), color = "blue", alpha = 0.7) +
labs(title = "FL time series and model",
subtitle = "data in black, model in blue",
y = "Log-millions of $", x = "Month [1M]") +
theme_light()
FL_seas_components_print
| Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| -0.013 | -0.07 | 0.045 | -0.003 | 0.024 | -0.032 | -0.001 | 0.014 | -0.024 | 0.028 | 0.024 | 0.006 |
FL_cycl_print
| AR(1) | |
|---|---|
| Coefficients | 0.4645 |
To illustrate the fit of the model, the following graph shows the original time series (black curve) and the model (blue curve).
# plot model
FL_fitted <- FL_stl2$time.series[,2] + seasonal(FL_stl2) + FL_arma$fitted # fitted values
plot8 <- ggplot() +
geom_line(data = FL_tsib, aes(x = Month, y = Imp), lwd = 0.5) +
geom_line(aes(x = FL_tsib$Month, y = FL_fitted), color = "blue", alpha = 0.7) +
labs(title = "FL time series and model",
subtitle = "data in black, model in blue",
y = "Log-millions of $", x = "Month [1M]") +
theme_light()
plot8
In the below residual plot, we see fairly constant scattering, with 1 outlier very far from 0. The variance in the residuals seems to be approximately constant, which suggest our model are pretty good. There are no obvious patterns, so our model is relatively successful at capturing the general behavior of the time series.
# data frame with fitted + residuals
FL_res_fitted_df <- data.frame(resid = FL_arma$residuals,
fitted = FL_fitted)
# residual plot
plot10 <- ggplot(data = FL_res_fitted_df, aes(y = resid, x = fitted)) +
geom_point() +
labs(title = "FL model residual plot", y = "Residuals", x = "Fitted") +
geom_hline(yintercept = 0, color = "grey30", lty = 2) + theme_light()
plot10
In the below ACF and PACF, we see close very few statistically significant spikes. We will verify with a statistical test in part (g) if the residuals follow a white noise process.
# ARMA residuals
FL_resid_tsib <- bind_cols(FL_tsib$Month, FL_arma$residuals) %>%
rename(Month = "...1", resid = "...2") %>%
tsibble(index = Month)
# ACF + PACF
plot12 <- ACF(FL_resid_tsib, lag_max = 36) %>% autoplot() + ggtitle("ACF of FL residuals") +
theme_light() +
PACF(FL_resid_tsib, lag_max = 36) %>% autoplot() + ggtitle("PACF of FL residuals") +
theme_light()
plot12
In the CUSUM plot for Florida, we see very little divergence from 0. In fact, the cumulative sum stays well within the red bands, indicating that there are no structural breaks in the model.
plot(efp(FL_arma$resid ~ 1, type = "Rec-CUSUM"))
# diagnostic statistics
FL_MAPE <- MAPE(.resid = FL_resid_tsib$resid, .actual = FL_ts) # MAPE
FL_RMSE <- RMSE(.resid = FL_resid_tsib$resid, .actual = FL_ts) # RMSE
FL_ME <- ME(.resid = FL_resid_tsib$resid) # ME
# Ljung-Box test
FL_ljung <- Box.test(FL_resid_tsib$resid, lag = 12, type = "Ljung-Box")
cat(sep = "", "MAPE: ", FL_MAPE,"\nRMSE: ", FL_RMSE,"\nME: ", FL_ME)
## MAPE: 0.4310564
## RMSE: 0.05325393
## ME: 0.0004696519
FL_ljung
##
## Box-Ljung test
##
## data: FL_resid_tsib$resid
## X-squared = 15.76, df = 12, p-value = 0.2025
coeftest(FL_arma)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.46449 0.06340 7.3264 2.365e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model for Florida has a MAPE of 0.4311, which is below 50%, not terrible for a model. The model’s RMSE is 0.0533 (in log-millions of dollars) – fairly low compared to the scale of the time series (a range from about 7 to 9). It is worth noting that the mean error is 0.0004697, so it is close to but not exactly 0, meaning that our model slightly underestimates the data.
To further assess model appropriateness, a Ljung-Box test using 12 lags was performed. The test had a \(p\)-value of 0.2025; so at the 0.05 significance level we fail to reject the null hypothesis, meaning the residuals follow a white noise process. Therefore our model successfully captured the serial correlation within the time series.
Lastly, we noticed that the ar1 coefficient has a very small P-value, so it’s statistically significant.
Since we used an ARMA model to model the remainder component of the STL decomposition, the residuals of the combined model will be the residuals from the ARMA model. Thus the widths of the prediction interval for the ARMA model will be the widths of the prediction interval for the combined model.
The forecasts of the STL decompositions used the naïve forecasting method, and the ARMA components were forecasted normally.
The below plot has the 12-step-ahead forecast with error bands, starting in April 2024 and ending in March 2025.
# forecasting components
FL_fore_ST <- as.numeric(forecast(FL_stl2, h = 12,
method = "naive")[["mean"]]) # naïve forecast for Trend + Seasonality
FL_fore_arma <- forecast(FL_arma, h = 12) # cycle component forecast
# combining the forecasts into one
FL_fore <- bind_cols(Month = seq(as.Date("2024-04-01"), as.Date("2025-03-01"), length.out = 12),
"estimate" = FL_fore_arma[["mean"]], # point estimate
FL_fore_arma[["lower"]], # lower error bands
FL_fore_arma[["upper"]], FL_fore_ST) %>% # upper error bands
# error bands
rename("lower_80" = "80%...3",
"lower_95" = "95%...4",
"upper_80" = "80%...5",
"upper_95" = "95%...6",
seas_trend = "...7") %>%
mutate_at(vars(estimate:upper_95), function(x) x + FL_fore_ST) %>%
dplyr::select(-seas_trend) %>%
mutate_at(vars(Month), yearmonth)
FL_fore_ts <- ts(FL_fore, start = c(2024, 4), frequency = 12)[,-1]
plot16 <- autoplot(FL_fore_ts[,1], color = "dodgerblue4") +
geom_ribbon(aes(ymin = FL_fore_ts[,3], ymax = FL_fore_ts[,5]),
fill = "dodgerblue2", alpha = 0.5) +
geom_ribbon(aes(ymin = FL_fore_ts[,2], ymax = FL_fore_ts[,4]),
fill = "dodgerblue3", alpha = 0.5) +
geom_line(aes(y = FL_fore_ts[,1]), color = "dodgerblue4", lwd = 0.7) +
labs(title = "FL exports forecast",
subtitle = "with error bands",
y = "Log-millions of $") +
geom_line(data = FL_ts) + xlim(2018, NA) + ylim(7.75, NA) +
theme_light()
plot16
## (i) Comparing to ARIMA Model
FL_arima <- auto.arima(FL_ts)
FL_arima_fcst <- forecast(FL_arima, h = 12) %>% # forecast
data.frame() %>%
rename(forecast = Point.Forecast,
lower.80 = Lo.80,
upper.80 = Hi.80,
lower.95 = Lo.95,
upper.95 = Hi.95)
FL_arima_fcst_ts <- ts(FL_arima_fcst, start = c(2024, 4), freq = 12)
# plotting
plot18 <- autoplot(FL_arima_fcst_ts[,1], color = "dodgerblue4") +
geom_ribbon(aes(ymin = FL_arima_fcst_ts[,4], ymax = FL_arima_fcst_ts[,5]),
fill = "dodgerblue2", alpha = 0.5) +
geom_ribbon(aes(ymin = FL_arima_fcst_ts[,2], ymax = FL_arima_fcst_ts[,3]),
fill = "dodgerblue3", alpha = 0.5) +
geom_line(aes(y = FL_arima_fcst_ts[,1]), color = "dodgerblue4", lwd = 0.7) +
labs(title = "FL exports ARIMA forecast",
subtitle = "with error bands",
y = "Log-millions of $") +
geom_line(data = FL_ts) + xlim(2018, NA) +
theme_light()
plot18
Above, we have a 12-steps-ahead forecast for Florida using an ARIMA model. Compared to the forecast from earlier, the ARIMA forecast shows larger error bands and less volatility in the 12-month forecasted window.
From the summary output below, we see the ARIMA model has a higher
MAPE: \(MAPE_{ARIMA}=0.5708\) compared
to \(MAPE_{T+S+C} = 0.4311\). Further
inspection of the other performance metrics (e.g. ME and RMSE) reveals
that the ARIMA model’s metrics have larger magnitudes than the original
model in general. Thus our original model (STL decomposition with an
ARMA component) performs better than the ARIMA model.
summary(FL_arima)
## Series: FL_ts
## ARIMA(4,1,0)(2,0,0)[12]
##
## Coefficients:
## ar1 ar2 ar3 ar4 sar1 sar2
## -0.3448 -0.1568 -0.1650 -0.1389 0.2301 0.2089
## s.e. 0.0758 0.0754 0.0746 0.0718 0.0754 0.0835
##
## sigma^2 = 0.004578: log likelihood = 249.09
## AIC=-484.17 AICc=-483.57 BIC=-461.3
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.004272695 0.06643662 0.04975053 0.04519978 0.5707926 0.4728538
## ACF1
## Training set 0.00725271
# combining models with regression weighting scheme
FL_combined <- lm(FL_ts ~ FL_fitted + FL_arima$fitted)
FL_combined_resid <- FL_combined$resid
# MAPE comparison
FL_comb_MAPE <- MAPE(.resid = FL_combined_resid, .actual = FL_ts)
FL_arima_MAPE <- MAPE(.resid = FL_arima$residuals, .actual = FL_ts)
cat(sep = '',"T+S+C model MAPE: ", FL_MAPE, '\n',
"ARIMA model MAPE: ", FL_arima_MAPE, "\n",
"Cmb'd model MAPE: ", FL_comb_MAPE)
## T+S+C model MAPE: 0.4310564
## ARIMA model MAPE: 0.5707926
## Cmb'd model MAPE: 0.4314039
We combined the two models (ARIMA model and STL decomposed with ARMA component model) by implementing a rudimentary weighting scheme with a linear regression. When comparing the three models, we observe that the original model (STL decomposed with ARMA component) still has the lowest MAPE. Notably, the combined model performs very similarly to the original model, far better than the ARIMA model.
# 1)
#import data
tn <- read.csv("IMPTOTTN.csv")
#convert to time series
tn_ts <- ts(tn$IMPTOTTN, start = 2008, freq = 12)
a)
#plot time series
tsdisplay(tn_ts)
#plot differenced time series
tsdisplay(diff(tn_ts))
#Because the original time series looks non-stationary, I will use the differenced time series for ease of modeling.
b)
stl <- stl(diff(tn_ts), s.window="periodic", robust=TRUE)
autoplot(stl) +
ggtitle("TN STL Decomposition")
The STL decomposition shows that there is no trend. There is seasonality since the seasonal component looks consistently strong over time. There are cycles since the remainder component shows evidence of cyclical patterns.
c)
#tsdisplay(diff(tn_ts))
plot(diff(tn_ts), col='gray')
model=Arima(diff(tn_ts),order=c(2,0,0),seasonal=list(order=c(1,0,1)))
lines(model$fitted,col="red")
Based on the tsdisplay in a), I constructed a model by analyzing the ACF/PACF plots of the differenced time series. There are significant spikes at the seasonal lags of the acf and pacf plots, signalling that seasonal AR(1) and seasonal MA(1) model would be appropriate. Additionally, significant spikes at lower order lags 1 and 2 of the pacf plot suggest an AR(2) model would be appropriate to capture the cycles.
e)
plot(model$fitted, model$res)
There is a lot of scatter and no clear patterns in the residuals vs. fitted values plot. Most values center around the mean of 0, with the presence of several outliers.
f)
par(mfrow=c(2,1))
acf(model$res, lag=100, main="ACF of Residuals")
pacf(model$res, lag=100, main="PACF of Residuals")
par(mfrow=c(1,1))
We hardly see any significant spikes in both the ACF and PACF plots of the residuals. There are a few lags that slightly cross the bands, but for the most part, we have gotten rid of the dynamics present in the residuals.
g)
plot(efp(model$res~1, type ="Rec-CUSUM"))
The model does not break at any point and stays within the bands. There appears to be negative drift around 2009, then positive drift afterwards. Overall, values tend to be around the mean of 0.
h)
coeftest(model)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.4657432 0.0702738 -6.6276 3.413e-11 ***
## ar2 -0.2196448 0.0704367 -3.1183 0.001819 **
## sar1 0.9703711 0.0223859 43.3474 < 2.2e-16 ***
## sma1 -0.8001186 0.0730599 -10.9515 < 2.2e-16 ***
## intercept 0.0029141 0.0091404 0.3188 0.749869
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For all the coefficients, the p-values are statistically significant. The output confirms the significant of the autoregressive and seasonal terms in the model.
Box.test(model$res, lag=12, type="Ljung-Box")
##
## Box-Ljung test
##
## data: model$res
## X-squared = 14.14, df = 12, p-value = 0.2918
Since the p-value = 0.29 for the Ljung-Box test, we fail to reject the null and conclude that the residuals exhibit a white noise fit and captures the data’s serial correlation.
qqnorm(as.numeric(model$res), main = "QQPlot")
qqline(model$res, col = "red")
The QQ plot shows that the residuals tend to approximate a normal distribution, which is desirable.
i)
model=Arima(diff(tn_ts),order=c(2,0,0),include.drift=TRUE,seasonal=list(order=c(1,0,1)))
plot(forecast(model,h=12),shadecols="oldstyle", main = "Model Forecast")
j)
fit=auto.arima(diff(tn_ts))
plot(forecast(fit,h=12),shadecols="oldstyle")
mape(model$res, diff(tn_ts))
## [1] 2.459124
mape(fit$res, diff(tn_ts))
## [1] 6.345477
My model performs better because the mean absolute percentage error is approximately 2.45%, which is lower than the 6.34% for the auto.arima model.
k)
#generate forecasts
forecast_model <- forecast(model, h = 12)
forecast_fit <- forecast(fit, h = 12)
#get the forecasted means from each model
forecast_means_model <- forecast_model$mean
forecast_means_fit <- forecast_fit$mean
#calculate the average forecast at each forecast horizon
average_forecast <- (forecast_means_model + forecast_means_fit) / 2
#get the actual values for the forecast horizon of 12 steps ahead
actual_values <- diff(tn_ts)[1:12]
#calculate mape for the combined forecast
mape_combined <- mape(average_forecast, actual_values)
mape_combined
## [1] 2.176244
The combined forecast MAPE of 2.17% is better than the MAPE for the auto arima model and my model. This suggests that the combined forecast outperforms the two models individually.
# Combine into one data frame
y=cbind(FL_ts,tn_ts)
y_tot=data.frame(y)
# Fit VAR model
VARselect(y_tot, lag.max = 20)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 13 13 3 13
##
## $criteria
## 1 2 3 4 5
## AIC(n) -1.022210e+01 -1.050801e+01 -1.063004e+01 -1.062614e+01 -1.069179e+01
## HQ(n) -1.017808e+01 -1.043465e+01 -1.052734e+01 -1.049410e+01 -1.053041e+01
## SC(n) -1.011359e+01 -1.032716e+01 -1.037686e+01 -1.030062e+01 -1.029393e+01
## FPE(n) 3.635825e-05 2.731772e-05 2.418066e-05 2.427743e-05 2.273835e-05
## 6 7 8 9 10
## AIC(n) -1.074965e+01 -1.074499e+01 -1.073145e+01 -1.080500e+01 -1.083370e+01
## HQ(n) -1.055892e+01 -1.052493e+01 -1.048204e+01 -1.052624e+01 -1.052561e+01
## SC(n) -1.027945e+01 -1.020246e+01 -1.011658e+01 -1.011779e+01 -1.007415e+01
## FPE(n) 2.146475e-05 2.157120e-05 2.187378e-05 2.033275e-05 1.976934e-05
## 11 12 13 14 15
## AIC(n) -1.086146e+01 -1.091281e+01 -1.101049e+01 -1.099208e+01 -1.095162e+01
## HQ(n) -1.052403e+01 -1.054604e+01 -1.061437e+01 -1.056661e+01 -1.049682e+01
## SC(n) -1.002958e+01 -1.000859e+01 -1.003393e+01 -9.943176e+00 -9.830386e+00
## FPE(n) 1.924221e-05 1.829507e-05 1.660973e-05 1.693875e-05 1.766243e-05
## 16 17 18 19 20
## AIC(n) -1.091792e+01 -1.093189e+01 -1.091326e+01 -1.091697e+01 -1.096845e+01
## HQ(n) -1.043377e+01 -1.041840e+01 -1.037042e+01 -1.034479e+01 -1.036693e+01
## SC(n) -9.724341e+00 -9.665976e+00 -9.575005e+00 -9.506377e+00 -9.485522e+00
## FPE(n) 1.829686e-05 1.807538e-05 1.845258e-05 1.842590e-05 1.754546e-05
y_model=VAR(y_tot,p=13)
plot(y_model)
Since AIC, HQ, and FPE all suggest a lag of 13, the order is fixed on the thirteenth lag. Residuals reverting to zero in a VAR model indicate that the residuals exhibit properties of a white noise process which is a desirable characteristic for a well-specified VAR model.
plot(irf(y_model, n.ahead=36))
The impact of a shock to imported goods in Florida has a positive and relatively constant effect on imported goods in Tennessee. The impact of a shock to imports in Tennessee has a very minimal impact on imported goods in Florida.
grangertest(tn_ts ~ FL_ts, order = 13)
## Granger causality test
##
## Model 1: tn_ts ~ Lags(tn_ts, 1:13) + Lags(FL_ts, 1:13)
## Model 2: tn_ts ~ Lags(tn_ts, 1:13)
## Res.Df Df F Pr(>F)
## 1 155
## 2 168 -13 3.5766 6.396e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
grangertest(FL_ts ~ tn_ts, order = 13)
## Granger causality test
##
## Model 1: FL_ts ~ Lags(FL_ts, 1:13) + Lags(tn_ts, 1:13)
## Model 2: FL_ts ~ Lags(FL_ts, 1:13)
## Res.Df Df F Pr(>F)
## 1 155
## 2 168 -13 1.5387 0.1093
Changes to imports in Florida appear to cause changes to imports in Tennessee as the p-value is very close to zero, but changes to imports in Tennessee do not cause changes to imports in Florida as the p-value exceeds 0.05.
var.predict=predict(object=y_model, n.ahead=12)
plot(var.predict)
plot(stability(y_model, type = "Rec-CUSUM"), plot.type="single")
The VAR forecast shows minimal variation in the data over the next 12 periods, similar to the ARIMA and ARMA forecasts.
The recursive CUSUM plots for both Florida and Tennessee indicate that the VAR model is stable. While Florida’s plot shows a slight upward trend, it remains within the control limits (red bands). Tennessee’s plot stays close to zero, further supporting the model’s stability.
For our project, we built four models to predict imports in Florida and Tennessee: one with trend, seasonality, and cyclical components; an ARIMA model; a combined model; and a VAR model.
The first model was an STL decomposition for trend and seasonal components and an ARMA model for cycles. We then fit an ARIMA model using auto.arima and a combined model. For Florida, the original model outperformed the ARIMA and combined models, but for Tennessee, the combined model performed best. Our final model involved fitting a VAR model of order 13, and we performed Granger-Causality tests to determine that Florida imports “cause” Tennessee imports but not the other way around.
To improve our modeling in the future, we could have tried a changing weighting scheme for the combined model to see if the performance would have improved.
U.S. Census Bureau, Imports of Goods for Florida [IMPTOTFL], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/IMPTOTFL, May 17, 2024.
U.S. Census Bureau, Imports of Goods for Tennessee [IMPTOTTN], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/IMPTOTTN, May 17, 2024.